Considerations for Elasticsearch Dynamic Field Mapping
I expect to be grappling with Elasticsearch until the end of this year, unless I get tired of it and find something simpler to write about. That said, I probably won't have much writing time anyway: my weight loss progress in September wasn't as good as in the previous two months, so I might increase my exercise time.
Misconceptions About Dynamic Field Mapping
I have always had a stereotype that relational databases require a pre-defined schema, while NoSQL databases do not and can store data dynamically. As it turns out, I stumbled when I first encountered Elasticsearch.
First of all, although Elasticsearch supports Dynamic Mapping, it is actually not recommended to use it in a production environment. The reasons are as follows:
1. String Types Can Cause Storage Bloat
When using dynamic mapping, string types are stored by default as both a text type and a keyword type sub-field.
The text type performs tokenization and builds an inverted index, while the keyword type stores the full string for exact matching, sorting, and aggregation. This dual indexing significantly increases storage space usage.
Of course, if storage space is sufficient, using multi-fields to index the same field with multiple types to meet different query requirements is a common technique. This is a design choice that trades storage space for functional flexibility; for details, refer to Multi-fields.
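As a sketch of that trade-off, an explicit multi-field definition might look like the following (the `products` index and `title` field are hypothetical). Full-text queries hit `title`, while exact matching, sorting, and aggregations use `title.raw`:

```json
PUT /products
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

This is essentially the same shape dynamic mapping produces automatically for strings, except there the sub-field is named `.keyword`; defining it yourself lets you drop the sub-field entirely when you don't need it, saving the storage described above.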
2. Not All Features Support Dynamic Mapping
Not all types can be automatically handled via dynamic mapping. For example:
- Geospatial fields: If you want to use the `geo_point` or `geo_shape` geospatial query APIs, you must define these fields in the mapping beforehand. Even if you store JSON data that matches a geographic structure (e.g., `{"lat": 25.03, "lon": 121.56}`), if it is not pre-defined as a `geo_point` type, Elasticsearch will treat it as a standard `object`, making it impossible to use geospatial query functions like `geo_distance`.
- Nested objects: If you need to perform independent queries on objects within an array, you must use the `nested` type. Dynamic mapping will only create them as an `object` type, causing the object fields in the array to be flattened, making correct querying impossible.
- Custom analyzers: If you need specific text analysis methods (such as Chinese tokenization, synonym processing, etc.), you must explicitly specify the analyzer in the mapping; dynamic mapping will only use the default standard analyzer.
For related information, please refer to Field data types.
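A minimal explicit mapping covering all three cases above might look like this (the `places` index and its fields are illustrative, and the `smartcn` Chinese analyzer assumes the corresponding analysis plugin is installed):

```json
PUT /places
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" },
      "reviews": {
        "type": "nested",
        "properties": {
          "author": { "type": "keyword" },
          "rating": { "type": "integer" }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "smartcn"
      }
    }
  }
}
```

With this mapping, `geo_distance` queries work on `location`, and each object in the `reviews` array can be queried independently via `nested` queries instead of being flattened.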
3. The Risk of Mapping Explosion
The official Elasticsearch documentation specifically warns about the Mapping explosion issue. If you use dynamic mapping and the data source contains a large number of different field names (e.g., user-defined fields, dynamically generated keys), it may lead to:
- An explosive growth in the number of fields in the index.
- A significant increase in memory usage.
By default, an index can have a maximum of 1000 fields; exceeding this will result in the rejection of new documents.
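The field limit is governed by the `index.mapping.total_fields.limit` setting (default 1000). It can be raised per index if genuinely needed, though hitting the limit usually signals a data-modeling problem rather than a limit that is too low (the index name below is illustrative):

```json
PUT /my-index/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```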
4. Official Recommendation: Use Explicit Mapping
The official Elasticsearch documentation recommends using Explicit mapping to specify the data type for each field. This is the recommended practice for production environments because you can fully control how data is indexed to suit specific use cases. For related instructions, refer to Mapping.
Dynamic Mapping Type Conversion Rules
The following table shows the type mapping rules for Elasticsearch under different dynamic settings. For detailed explanations, please refer to Dynamic field mapping:
| JSON Data Type | Elasticsearch Type ("dynamic":"true") | Elasticsearch Type ("dynamic":"runtime") |
|---|---|---|
| `null` | No field added | No field added |
| `true` or `false` | `boolean` | `boolean` |
| `double` | `float` | `double` |
| `long` | `long` | `long` |
| `object` | `object` | No field added |
| `array` | Depends on the first non-null value in the array | Depends on the first non-null value in the array |
| `string` passing date detection | `date` | `date` |
| `string` passing numeric detection | `float` or `long` | `double` or `long` |
| `string` not passing date or numeric detection | `text` with a `.keyword` sub-field | `keyword` |
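To illustrate the table, indexing a document like the following into an index with `"dynamic": "true"` (the `demo` index is hypothetical) exercises most of the rows:

```json
PUT /demo/_doc/1
{
  "active": true,
  "count": 5,
  "price": 19.99,
  "created": "2025-10-04",
  "note": "hello world"
}
```

Here `active` becomes `boolean`, `count` becomes `long`, `price` becomes `float`, `created` becomes `date`, and `note` becomes `text` with a `.keyword` sub-field. Note that date detection is enabled by default, while numeric detection for strings is disabled unless `numeric_detection` is set to `true` on the index.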
Dynamic Parameter Settings
The dynamic parameter controls whether new fields are added dynamically and accepts the following options:
true (Default)
New fields are automatically added to the mapping. Suitable for rapid testing during the development phase, but not recommended for production environments.
runtime
New fields are added to the mapping as runtime fields. These fields are not indexed but are loaded from _source and calculated on the fly during queries. The advantage is that they do not consume index space; the disadvantage is that query performance is lower, making them suitable for fields that are not queried often but are occasionally needed.
false
New fields are ignored: they are not indexed or searchable, but they still appear in the `_source` of returned documents. Because these fields are never added to the mapping, you must add them explicitly later if you need to query them. This setting can prevent mapping explosion while keeping the original data intact.
strict
If a new field is detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping before they can be used. This is the strictest setting, suitable for scenarios that require strict control over data structure.
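A minimal sketch of a strict mapping (the `orders` index and its fields are hypothetical):

```json
PUT /orders
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "order_id": { "type": "keyword" },
      "amount": { "type": "double" }
    }
  }
}
```

Indexing a document that contains any field not declared under `properties` fails with a `strict_dynamic_mapping_exception`, so schema changes must go through an explicit mapping update first.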
For more detailed explanations, please refer to Dynamic mapping.
Conclusion
Although Elasticsearch's dynamic mapping feature seems convenient, it is recommended to plan your schema in advance and explicitly define the types for each field in production environments. This allows you to make the best trade-offs between storage space, query performance, and functional requirements, avoiding the trouble of re-indexing later.
Change Log
- 2025-10-04 Initial document creation.
